Why Words Alone Are Not Enough: Error Analysis of Lexicon-based Polarity Classifier for Czech
Abstract
The lexicon-based classifier has long been one of the main and most effective methods of polarity classification in sentiment analysis, i.e. the computational study of opinions, sentiments and emotions expressed in text (see Liu, 2010). Although it achieves relatively good results for Czech as well, the classifier still makes errors. This paper provides a detailed analysis of such errors, caused both by the system and by human reviewers. The identified errors are representative of the challenges faced by the entire field of opinion mining. The analysis is therefore essential for further research in the field and serves as a basis for meaningful improvements to the system.
Related resources
Subjectivity Lexicon for Czech: Implementation and Improvements
The aim of this paper is to introduce the Czech subjectivity lexicon, a new lexical resource for sentiment analysis in Czech. We describe particular stages of the manual refinement of the lexicon and demonstrate its use in state-of-the-art polarity classifiers, namely the Maximum Entropy classifier. We test the success rate of the system enriched with the dictionary on different data sets, ...
Citius: A Naive-Bayes Strategy for Sentiment Analysis on English Tweets
This article describes a strategy based on a naive-bayes classifier for detecting the polarity of English tweets. The experiments have shown that the best performance is achieved by using a binary classifier between just two sharp polarity categories: positive and negative. In addition, in order to detect tweets with and without polarity, the system makes use of a very basic rule that searches f...
Annotate-Sample-Average (ASA): A New Distant Supervision Approach for Twitter Sentiment Analysis
The classification of tweets into polarity classes is a popular task in sentiment analysis. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. A drawback of these approaches is the high cost involved in data annotation. Two freely available resources that can be exploited to solve the problem are: 1) large amounts...
A Supervised Approach for Sentiment Analysis using Skipgrams
We present a supervised hybrid approach for Sentiment Analysis in Twitter. A sentiment lexicon is built from a dataset, where each tweet is labelled with its overall polarity. In this work, skipgrams are used as information units (in addition to words and n-grams) to enrich the sentiment lexicon with combinations of words that are not adjacent in the text. This lexicon is employed in conjunctio...
Bootstrapping polarity classifiers with rule-based classification
In this article, we examine the effectiveness of bootstrapping supervised machine-learning polarity classifiers with the help of a domain-independent rule-based classifier that relies on a lexical resource, i.e., a polarity lexicon and a set of linguistic rules. The benefit of this method is that though no labeled training data are required, it allows a classifier to capture in-domain knowledge ...
Publication date: 2013